Abstract: A combinatorial model of molecular conformational space that was previously developed [1] had the drawback that structures could not be properly embedded, because it lacked explicit rotational symmetry. The problem can be circumvented by sorting the elementary 3D components of a molecular system into a finite set of classes that can be separately embedded. This also opens up the possibility of encoding the dynamical states into a graph structure.

I. Introduction

In a previous paper [1] a combinatorial model of molecular conformational space (hereafter referred to as CS) was presented, and it was shown that CS could be described with a fair degree of accuracy by a central arrangement of hyperplanes¹ that partitions the space into a set of cells. The arrangement was defined such that, for a molecule of N atoms, the 3-dimensional (3D) conformations in a cell all have the same dominance sign vector: for a given vector $p \in \mathbb{R}^N$ there is an associated dominance sign vector $V(p) = (d_{12}, d_{13}, \ldots, d_{N-1,N})$ whose components are defined as follows

$$
d_{ij} = \begin{cases} - & p_i < p_j \\ 0 & p_i = p_j \\ + & p_i > p_j \end{cases}
\qquad 1 \le i < j \le N \qquad (1)
$$
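As an illustration of definition (1), the short Python sketch below computes $V(p)$ for one coordinate. It assumes, as an encoding choice not made in the text, that the components are listed in lexicographic pair order $(d_{12}, d_{13}, \ldots, d_{N-1,N})$ and written as a string over '-', '0', '+'; the function name is hypothetical.

```python
from itertools import combinations


def dominance_sign_vector(p):
    """Return V(p) = (d_12, d_13, ..., d_{N-1,N}) as a string over '-', '0', '+'.

    d_ij is '-' if p_i < p_j, '0' if p_i == p_j and '+' if p_i > p_j (eq. (1)).
    Pairs are taken in lexicographic order; indices are 0-based internally.
    """
    signs = []
    for i, j in combinations(range(len(p)), 2):
        if p[i] < p[j]:
            signs.append('-')
        elif p[i] == p[j]:
            signs.append('0')
        else:
            signs.append('+')
    return ''.join(signs)


# x coordinates of a hypothetical 4-atom molecule
print(dominance_sign_vector([0.0, 1.5, 1.5, -2.0]))  # --+0++
```

A molecule of N atoms yields N(N-1)/2 components per coordinate, so three such strings describe one 3D conformation.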



There is a set of three dominance sign vectors per 3D conformation, one for each coordinate: the partition is actually a product of three partitions [1].

A central concept for the combinatorial study of a hyperplane arrangement is the face lattice [2]: the poset of the cells in the induced decomposition of $\mathbb{R}^{3N-3}$, ordered by inclusion. It is this hierarchical structure that enables us to manage the sheer complexity of CS since, with the simple codes (1), we can encompass everything from broad regions down to single cells.

The model takes into account two basic symmetries 
of CS: 

1) The translation symmetry: for a molecule with N atoms the model is built in a $(3N-3)$-dimensional subspace since, for each of the x, y and z coordinates, the direction parallel to the vector (1, 1, 1, ...) contains conformations that are obtained by translation along the corresponding axis.

2) The scaling symmetry: the points lying on a half- 
line starting at the origin result from multiplying the 
coordinates of a given 3D conformation by an arbitrary 
positive factor. It reflects the fact that the unit length 
in our system can be arbitrarily defined. 
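Both symmetries can be checked numerically. The sketch below, using hypothetical coordinates and hypothetical helper names, computes the three dominance sign vectors of a conformation and compares them after a translation, a positive scaling and a rotation about z; only the rotation changes them, which anticipates the difficulty with rotations.

```python
import math
from itertools import combinations


def dominance_sign_vector(p):
    # '-', '0', '+' for p_i < p_j, p_i == p_j, p_i > p_j, pairs in lexicographic order
    return ''.join('-' if p[i] < p[j] else '0' if p[i] == p[j] else '+'
                   for i, j in combinations(range(len(p)), 2))


def sign_vectors(conf):
    """Three dominance sign vectors, one per coordinate, of a list of (x, y, z) atoms."""
    return tuple(dominance_sign_vector([atom[k] for atom in conf]) for k in range(3))


def translate(conf, t):
    return [tuple(c + d for c, d in zip(atom, t)) for atom in conf]


def scale(conf, s):
    return [tuple(s * c for c in atom) for atom in conf]


def rotate_z(conf, theta):
    cos_t, sin_t = math.cos(theta), math.sin(theta)
    return [(cos_t * x - sin_t * y, sin_t * x + cos_t * y, z) for x, y, z in conf]


# Hypothetical 4-atom conformation, chosen only for illustration
conf = [(0.0, 0.0, 0.0), (1.0, 0.2, 0.1), (0.3, 1.1, -0.4), (0.6, 0.4, 0.9)]
ref = sign_vectors(conf)
print(sign_vectors(translate(conf, (5.0, -2.0, 3.0))) == ref)  # True
print(sign_vectors(scale(conf, 2.5)) == ref)                    # True
print(sign_vectors(rotate_z(conf, math.pi / 2)) == ref)         # False
```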

The model, however, fails to incorporate the all-important rotation symmetry; this is due to the fact that combinatorial approaches like ours apply mostly to linear systems. This greatly complicates the embedding of 3D conformations.

To circumvent this problem, in the present communication we explore how conformations can be embedded in CS starting from their elementary building blocks; the most elementary 3D component structure is a simplex². Many structural patterns in molecular systems can be decomposed into simplexes [3].

As we shall see below, embedding even a single simplex in CS is not straightforward, but this approach leads us to study a set of combinatorial structures that offer, beyond the embedding problem, the interesting possibility of encoding the dynamical states of a molecular system.

II. Embedding a simplex 

Fig. 1. Graph of a simplex, with the vectors along the edges oriented so as to make the graph acyclic. If we assume that $v_1$ lies above the plane of the figure, it corresponds to a right-handed simplex.

¹ The term central means that all the hyperplanes pass through the origin.
² A three-dimensional polytope with four vertices.

From the simplex of Fig. 1 we define the following set of vectors and their associated central planes:

$$
e_{ij} = v_i - v_j, \qquad \mathcal{E}_{ij} = \{x \in \mathbb{R}^3 : e_{ij} \cdot x = 0\} \qquad (2)
$$

for $1 \le i < j \le 4$. These six planes generate a partition of 3D space into 24 cells [4] (see Fig. 2): each plane divides the space into a positive and a negative hemispace, with the plane itself (the zero space) in between

$$
\mathcal{E}^+_{ij} = \{x \in \mathbb{R}^3 : e_{ij} \cdot x > 0\}, \qquad \mathcal{E}^-_{ij} = \{x \in \mathbb{R}^3 : e_{ij} \cdot x < 0\} \qquad (3)
$$

A 3-D cell results from the intersection of six hemispaces; thus it can be unambiguously characterized by the signs of these six hemispaces (Fig. 2).

It is easy to see that the dominance sign vectors for an arbitrary 3D reference system centered at the origin can be obtained from this partition: consider, for instance, the z-axis and suppose that $z \cdot e_{12} > 0$. This means that z is in a cell where the $e_{12}$ component of the sign vector is +, which in turn implies that $v_{1z} > v_{2z}$.

Thus the dominance sign vector for each coordinate will be the sign vector of the cell that contains its positive semi-axis.
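This correspondence rests on the identity $a \cdot e_{ij} = a \cdot v_i - a \cdot v_j$ for any axis $a$. The sketch below checks it for a hypothetical simplex and a reference frame rotated about z by an arbitrary angle; the vertex coordinates, the angle and all function names are assumptions made for illustration.

```python
import math
from itertools import combinations

PAIRS = list(combinations(range(4), 2))  # 12, 13, 14, 23, 24, 34 in 0-based form


def sub(a, b):
    return tuple(x - y for x, y in zip(a, b))


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def cell_sign_vector(axis, verts):
    """Signs of axis . e_ij with e_ij = v_i - v_j: the cell of (2) containing the axis."""
    out = []
    for i, j in PAIRS:
        s = dot(axis, sub(verts[i], verts[j]))
        out.append('+' if s > 0 else '-' if s < 0 else '0')
    return ''.join(out)


def dominance_sign_vector(p):
    # eq. (1): '-', '0', '+' for p_i < p_j, p_i == p_j, p_i > p_j
    return ''.join('-' if p[i] < p[j] else '0' if p[i] == p[j] else '+' for i, j in PAIRS)


# Hypothetical simplex: v1, v2, v3 lie in the plane z = 0, v4 above it
verts = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.3, 0.9, 0.0), (0.2, 0.3, 0.8)]
t = 0.7  # arbitrary rotation of the x and y axes about z
axes = {'x': (math.cos(t), math.sin(t), 0.0),
        'y': (-math.sin(t), math.cos(t), 0.0),
        'z': (0.0, 0.0, 1.0)}

for name, a in axes.items():
    coords = [dot(v, a) for v in verts]  # coordinate of each atom along this axis
    print(name, cell_sign_vector(a, verts), dominance_sign_vector(coords))
```

For the z axis both strings are 00-0--: the zeros mark the planes $\mathcal{E}_{12}$, $\mathcal{E}_{13}$, $\mathcal{E}_{23}$ that contain it, since here z is parallel to the normal of the face 123.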




Fig. 2. Tope graph of the partition (2): each node corresponds to a 3-D cell and the edges represent the planes separating the cells. For each node the corresponding sign vector is annotated on the right.

Lower dimensional cells occur for vectors that lie in one or more of the planes $\mathcal{E}_{ij}$; in that case the corresponding components of the sign vector are zero.

The one-dimensional (1-D) cells are rays starting at the origin and running parallel to the vectors

$$
\begin{aligned}
&f_{123} = e_{12} \wedge e_{23}, \quad f_{124} = e_{12} \wedge e_{24}, \quad f_{134} = e_{13} \wedge e_{34}, \quad f_{234} = e_{23} \wedge e_{34},\\
&f_{12,34} = e_{12} \wedge e_{34}, \quad f_{13,24} = e_{13} \wedge e_{24}, \quad f_{14,23} = e_{14} \wedge e_{23}
\end{aligned}
\qquad (4)
$$

The first four are the vectors perpendicular to the faces of the simplex; the last three are perpendicular to pairs of non-adjacent edges. As in (2), they have a set of associated central planes

$$
\mathcal{F}_{123},\ \mathcal{F}_{124},\ \mathcal{F}_{134},\ \mathcal{F}_{234},\ \mathcal{F}_{12,34},\ \mathcal{F}_{13,24},\ \mathcal{F}_{14,23},
\qquad \mathcal{F}_a = \{x \in \mathbb{R}^3 : f_a \cdot x = 0\} \qquad (5)
$$

The corresponding sign vectors are the rows in the matrix below:

(6)

The zeros in the matrix correspond to the planes that contain the corresponding 1-D cell; this means that the cells encoded by $f_{123}$ and $f_{12,34}$, for instance, are surrounded by six and four 3-D cells respectively.

The zeros in the sign vector of lower dimensional cells can be seen as a sort of wildcard: they match the sign vectors of all the adjacent cells. The converse is also true: the sign vector of a 3-D cell can be obtained by adding up the sign vectors of the adjacent lower dimensional cells [4]. As an example, for the lower left cell of Fig. 2 we have

$$
(-\,-\,-\,+\,+\,+) = (-\,0\,0\,+\,+\,0) + (-\,-\,0\,0\,+\,+) + (-\,-\,-\,0\,0\,0) = -\mathrm{SIGN}(f_{134}) + \mathrm{SIGN}(f_{14,23}) - \mathrm{SIGN}(f_{234}).
$$

III. Embedding a simplex

There is still another set of sign vectors that will be most useful in characterizing the geometric properties of simplexes: the signs of the scalar products of the vectors (2) and (4) among themselves.

Let us assume that we have a particular right-handed simplex whose set of signs is
(7a)

(7b)



The set of sign vectors (7a) refers mostly to the angles between adjacent edges, while (7b) is mostly related to the dihedral angles between contiguous faces: +, 0 and - stand for acute, right and obtuse angles respectively. Thus (7) gives us a rough outline of the geometry of a simplex, and allows a classification of simplexes.
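To make (4), (6) and (7) concrete, the sketch below computes them for a hypothetical simplex. Its assumptions, which are not stated in the text: the wedge product in $\mathbb{R}^3$ is evaluated as the cross product, sign-vector components are ordered 12, 13, 14, 23, 24, 34, and (7a) and (7b) are read as the signs of all pairwise scalar products among the $e_{ij}$ and among the $f_a$ respectively. It also counts the zeros of each row of (6), since a ray lying in k distinct planes is surrounded by 2k three-dimensional cells.

```python
from itertools import combinations


def sub(a, b):
    return tuple(x - y for x, y in zip(a, b))


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1], a[2] * b[0] - a[0] * b[2], a[0] * b[1] - a[1] * b[0])


def sgn(x, eps=1e-12):
    return '+' if x > eps else '-' if x < -eps else '0'


def edge_vectors(v):
    """e_ij = v_i - v_j of (2), keyed '12', '13', ..., '34'."""
    return {f"{i + 1}{j + 1}": sub(v[i], v[j]) for i, j in combinations(range(4), 2)}


def normal_vectors(e):
    """The seven vectors of (4); the wedge product is taken as the cross product."""
    defs = {'123': ('12', '23'), '124': ('12', '24'), '134': ('13', '34'),
            '234': ('23', '34'), '12,34': ('12', '34'), '13,24': ('13', '24'),
            '14,23': ('14', '23')}
    return {name: cross(e[a], e[b]) for name, (a, b) in defs.items()}


def ray_sign_vectors(e, f):
    """Sign vector of each ray f_a with respect to the six planes of (2): the rows of (6)."""
    return {name: ''.join(sgn(dot(fa, eij)) for eij in e.values()) for name, fa in f.items()}


def class_signs(e, f):
    """Assumed reading of (7a) and (7b): signs of all pairwise scalar products."""
    s7a = ''.join(sgn(dot(e[a], e[b])) for a, b in combinations(e, 2))
    s7b = ''.join(sgn(dot(f[a], f[b])) for a, b in combinations(f, 2))
    return s7a, s7b


# Hypothetical simplex (coordinates chosen for illustration only)
v = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.3, 0.9, 0.0), (0.2, 0.3, 0.8)]
e = edge_vectors(v)
f = normal_vectors(e)
rows = ray_sign_vectors(e, f)
for name, row in rows.items():
    print(f"f_{name}: {row}  (surrounded by {2 * row.count('0')} 3-D cells)")
s7a, s7b = class_signs(e, f)
print(s7a, s7b)
```

The face normals come out with three zeros and the edge-pair normals with two, matching the six and four surrounding cells quoted for $f_{123}$ and $f_{12,34}$.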

Next we are going to embed our simplex in CS for the particular case where the z-axis runs parallel to $f_{123}$. The reason for this special choice will be explained below.
Fig. 3. View from above of $\mathcal{F}_{123}$. The plane appears divided into a number of sectors; the rays separating the sectors run along vectors whose names appear at the outer end. Vector names bearing a ' are projections: $x' = x - f_{123}(x \cdot f_{123})/\|f_{123}\|^2$. Rays orthogonal to vectors $e'_{ij}$ always correspond to intersections with the planes $\mathcal{E}_{ij}$. In the inner layer each sector holds its corresponding sign vector; in the outer layer are the sign vectors corresponding to the 1-D cells. Sign vectors should be read in the outward direction. The x and y axes are depicted in an arbitrary position.

From Fig. 3 we can see that $\mathcal{F}_{123}$ has been partitioned into 24 cells, which are delimited by lines along the projections of the vectors $e_{ij}$ and by the intersections with the planes $\mathcal{E}_{ij}$ (2). Each projected vector $e'_{ij}$ is perpendicular to the intersection with the corresponding plane $\mathcal{E}_{ij}$, and that intersection divides the plane into two signed halves as in (2) and (3), so the sign vector associated with each cell in Fig. 3 follows directly: by construction, it is the dominance sign vector associated with a coordinate axis lying in that cell. Also by construction, the sixth cell after (or before) the one under consideration is orthogonal to it. Thus, by rotating the x and y axes around z in Fig. 3, we can scan the complete set of dominance sign vectors that arise in this particular situation. Before proceeding further, let us explain how Fig. 3 can be obtained from (6) and (7).
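The scan just described can be sketched numerically. Assumptions not taken from the text: the hypothetical simplex below has $v_1, v_2, v_3$ in the plane $z = 0$, so that z runs parallel to $f_{123}$; the x axis is parametrized by an angle t in that plane and y is x rotated by +90°. The cell boundaries are the angles at which x or y becomes orthogonal to some $e_{ij}$; evaluating the three sign vectors inside each cell gives the kind of information shown in the inner layer of Fig. 3.

```python
import math
from itertools import combinations


def sub(a, b):
    return tuple(x - y for x, y in zip(a, b))


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1], a[2] * b[0] - a[0] * b[2], a[0] * b[1] - a[1] * b[0])


def unit(a):
    n = math.sqrt(dot(a, a))
    return tuple(c / n for c in a)


def sgn(x, eps=1e-12):
    return '+' if x > eps else '-' if x < -eps else '0'


def scan(v, z_dir):
    """Rotate x and y about z = z_dir; return boundary angles and per-cell sign triples."""
    e = [sub(v[i], v[j]) for i, j in combinations(range(4), 2)]
    z = unit(z_dir)
    helper = (1.0, 0.0, 0.0) if abs(z[0]) < 0.9 else (0.0, 1.0, 0.0)
    u = unit(cross(helper, z))  # in-plane unit vector
    w = cross(z, u)             # completes the right-handed frame (u, w, z)

    def frame(t):
        x = tuple(math.cos(t) * a + math.sin(t) * b for a, b in zip(u, w))
        y = tuple(-math.sin(t) * a + math.cos(t) * b for a, b in zip(u, w))
        return x, y

    def sv(d):
        return ''.join(sgn(dot(d, eij)) for eij in e)

    crit = []
    for eij in e:
        a, b = dot(u, eij), dot(w, eij)
        if math.hypot(a, b) < 1e-12:
            continue  # e_ij parallel to z: that component is 0 for every x and y
        t0 = math.atan2(-a, b)  # x(t0) is orthogonal to e_ij
        # x is orthogonal to e_ij at t0 and t0 + pi, y at t0 +/- pi/2
        crit += [(t0 + k * math.pi / 2) % (2 * math.pi) for k in range(4)]
    crit.sort()

    cells = []
    for k, t in enumerate(crit):
        t_next = crit[(k + 1) % len(crit)] + (2 * math.pi if k + 1 == len(crit) else 0.0)
        x, y = frame((t + t_next) / 2)
        cells.append((sv(x), sv(y), sv(z)))
    return crit, cells


v = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.3, 0.9, 0.0), (0.2, 0.3, 0.8)]
f123 = cross(sub(v[0], v[1]), sub(v[1], v[2]))  # e_12 wedge e_23
crit, cells = scan(v, f123)
print(len(crit), len(set(cells)))  # 24 24
# the sixth cell after cell k holds x where cell k holds y
print(all(cells[(k + 6) % 24][0] == cells[k][1] for k in range(24)))  # True
```

The last line checks the orthogonality property stated above: because y is x turned by a quarter turn and the set of boundary angles is invariant under that turn, the x sign vector six cells ahead equals the current y sign vector.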

By construction (see Fig. 1 and (4)), $e_{12}$, $e_{13}$ and $e_{23}$ are in counter-clockwise circular order, with the last two vectors in the negative hemispaces of $e_{12}$ (7a) and $f_{124}$ (6). For the remaining projections:

$e'_{14}$) $e_{14}$ lies above $\mathcal{F}_{123}$ (6), and by (7b) the same is true of $f_{134}$ and $f_{14,23}$; as $e_{14}$ is contained in the planes perpendicular to these vectors, $e'_{14}$ will be located inside the sector determined by $e_{13}$ and $e_{23}$.

$e'_{24}$) $e_{24}$ lies above $\mathcal{F}_{123}$ (6), as do $f_{124}$ and $f_{234}$ (7b); by the same argument as above, $e'_{24}$ will be located inside the sector determined by $-e_{12}$ and $e_{23}$.

$e'_{34}$) $e_{34}$ lies in the positive hemispaces of $e_{12}$, $e_{13}$, $e_{23}$ and $e_{24}$ (7a); this squeezes $e'_{34}$ between $\mathcal{E}_{12}$ and $\mathcal{E}_{24}$.

IV. The circular order of the projected vectors




Fig. 4. Tope graph of the partition (4), showing the sign vectors of 1-D cells. As in Fig. 2 the graph is planar; sign vectors of 3-D cells can be obtained by adding the sign vectors around each node. Notice that the columns in (6), or their centrally symmetric sign vectors, all correspond to sign vectors above.

The circular order of the projected vectors in Fig. 3 can be obtained from the signs of the first row in (7b): one can see from Fig. 3 that the shortest path between $e_{12}$ and $e'_{34}$ runs clockwise, while the shortest path from $e_{23}$ to $e'_{24}$ and $e'_{34}$ runs counterclockwise. This is simply due to the fact that $e_{23}$, $e_{24}$ and $e_{34}$, for instance, are contained in $\mathcal{F}_{234}$ and the angle between $e_{23}$ and $e_{34}$ cannot exceed $\pi$; thus their circular order in the projection depends on whether $f_{234}$ lies above or below $\mathcal{F}_{123}$.
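The orientation argument can be checked with a short sketch. For two vectors a and b at an angle smaller than π, the shortest path from the projection of a to that of b is counterclockwise, seen from the tip of $f_{123}$, exactly when $f_{123} \cdot (a' \times b') > 0$; projecting along $f_{123}$ leaves this triple product unchanged, so for $a = e_{23}$, $b = e_{34}$ its sign is that of $f_{123} \cdot f_{234}$ (taking ∧ as the cross product). The two simplexes below are hypothetical and differ only in the position of $v_4$.

```python
def sub(a, b):
    return tuple(x - y for x, y in zip(a, b))


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1], a[2] * b[0] - a[0] * b[2], a[0] * b[1] - a[1] * b[0])


def sgn(x, eps=1e-12):
    return '+' if x > eps else '-' if x < -eps else '0'


def project(x, n):
    """x' = x - n (x . n) / ||n||^2, the projection used in Fig. 3."""
    k = dot(x, n) / dot(n, n)
    return tuple(a - k * b for a, b in zip(x, n))


def turn(a, b, n):
    """'+' if the shortest path from a to b is counterclockwise seen from the tip of n."""
    return sgn(dot(n, cross(a, b)))


def check(v):
    """Orientation of (e'_23, e'_34) in F_123 versus the sign of f_123 . f_234."""
    e12, e23, e34 = sub(v[0], v[1]), sub(v[1], v[2]), sub(v[2], v[3])
    f123, f234 = cross(e12, e23), cross(e23, e34)
    projected = turn(project(e23, f123), project(e34, f123), f123)
    return projected, sgn(dot(f123, f234))


# Hypothetical simplexes differing only in v4
print(check([(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.3, 0.9, 0.0), (0.2, 0.3, 0.8)]))   # ('+', '+')
print(check([(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.3, 0.9, 0.0), (0.9, 0.8, -0.5)]))  # ('-', '-')
```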

To obtain the sign vector encoding the circular order in the plane perpendicular to a vector in general position, it suffices to check whether the given vector lies in the positive or the negative hemispace of each of the planes (5). Thus the central arrangement generated by (4) partitions the space into 32 cells, which are represented by the tope graph of Fig. 4.
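The cell counts quoted so far (24 for the planes (2), 32 for the planes (5)) can be reproduced with a short sketch. This is not the method of the paper: it counts the cells of a central plane arrangement in $\mathbb{R}^3$ with Euler's formula on the unit sphere, merging intersection lines that coincide. It assumes ∧ is the cross product and uses a hypothetical generic simplex; the function name is illustrative.

```python
import math
from itertools import combinations


def sub(a, b):
    return tuple(x - y for x, y in zip(a, b))


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1], a[2] * b[0] - a[0] * b[2], a[0] * b[1] - a[1] * b[0])


def unit(a):
    n = math.sqrt(dot(a, a))
    return tuple(c / n for c in a)


def count_cells(normals, tol=1e-9):
    """Number of 3-D cells of the central arrangement {x : n . x = 0, n in normals}.

    Assumes at least two planes and no two coinciding planes. Each plane cuts the
    unit sphere in a great circle; with V vertices and E arcs, Euler's formula
    V - E + F = 2 gives the number F of cells.
    """
    lines = []                           # distinct intersection lines (unit directions)
    on_plane = [set() for _ in normals]  # indices of the lines lying in each plane
    for i, j in combinations(range(len(normals)), 2):
        d = unit(cross(normals[i], normals[j]))
        for idx, known in enumerate(lines):
            c = cross(d, known)
            if dot(c, c) < tol * tol:    # same line, up to orientation
                break
        else:
            lines.append(d)
            idx = len(lines) - 1
        on_plane[i].add(idx)
        on_plane[j].add(idx)
    vertices = 2 * len(lines)                   # each line meets the sphere twice
    arcs = sum(2 * len(s) for s in on_plane)    # a circle with 2m vertices has 2m arcs
    return 2 - vertices + arcs


v = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.3, 0.9, 0.0), (0.2, 0.3, 0.8)]
e = {f"{i + 1}{j + 1}": sub(v[i], v[j]) for i, j in combinations(range(4), 2)}
f = [cross(e[a], e[b]) for a, b in (('12', '23'), ('12', '24'), ('13', '34'), ('23', '34'),
                                    ('12', '34'), ('13', '24'), ('14', '23'))]
print(count_cells(list(e.values())))  # 24, the planes (2)
print(count_cells(f))                 # 32, the planes (5)
```

The coincidences matter: three planes of (2) meet along each face normal, and three planes of (5) meet along each edge, which is why the counts are lower than for planes in general position.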

This settles the problem of the circular ordering of 
the projected vectors which is completely determined by 
(6). 

Last but not least, for the correct simultaneous allocation of the x and y dominance sign vectors it is indispensable to determine the relative positions, in the circular ordering, of the projected vectors $e'_{ij}$ and the intersections with the planes $\mathcal{E}_{ij}$.

As this is not a linear problem, in some cases it can only be partially resolved by (7). Ambiguities can arise when building a projection: for instance, if in the example of Fig. 3 $e'_{24}$ and the intersection with one of the planes $\mathcal{E}_{ij}$ both fell in the same sector. In that case we would have to split the diagram into two alternative ones.

V. The enumeration of minimal vectors 

In the diagram from Fig. 3 the dominance sign vector 
associated with the {x, y, z} reference frame is 

((+ +),(+ + ),(00 + + +)), 

one can notice that it is squeezed between 

((+ -+), (+ + ), (00 + + +)) and 

((+ +),(+0 ),(00 + + +)). 

The importance of this diagram is that it enumerates all the lower dimensional 2-D cells associated with the sign vector (00 + + +). Since the rows of (6) are all the 1-D cells in the partition, constructing a diagram like the one in Fig. 3 for every row in (6) allows us to enumerate all the minimal vectors in our system (those bearing a maximum number of zeros), all other sign vectors being combinations of them.
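Following this procedure, a hypothetical sketch can collect, for each row of (6) and for its centrally symmetric ray, the sign-vector triples found on the boundary rays of a Fig. 3-like diagram, that is, the triples carrying an extra zero. The simplex, the function name and the string encoding are illustrative assumptions, ∧ is taken as the cross product, and the sketch does not attempt the combination step.

```python
import math
from itertools import combinations


def sub(a, b):
    return tuple(x - y for x, y in zip(a, b))


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1], a[2] * b[0] - a[0] * b[2], a[0] * b[1] - a[1] * b[0])


def unit(a):
    n = math.sqrt(dot(a, a))
    return tuple(c / n for c in a)


def sgn(x, eps=1e-12):
    return '+' if x > eps else '-' if x < -eps else '0'


def boundary_triples(v, z_dir):
    """(x, y, z) sign-vector triples on the boundary rays of a Fig. 3-like diagram.

    z runs along z_dir, x turns about z and y is x rotated by +90 degrees. On each
    boundary ray x or y is orthogonal to some e_ij, so the triple carries one more
    zero than the triples of the two adjacent cells.
    """
    e = [sub(v[i], v[j]) for i, j in combinations(range(4), 2)]
    z = unit(z_dir)
    helper = (1.0, 0.0, 0.0) if abs(z[0]) < 0.9 else (0.0, 1.0, 0.0)
    u = unit(cross(helper, z))
    w = cross(z, u)
    triples = set()
    for eij in e:
        a, b = dot(u, eij), dot(w, eij)
        if math.hypot(a, b) < 1e-12:
            continue  # e_ij parallel to z
        t0 = math.atan2(-a, b)  # x(t0) is orthogonal to e_ij
        for k in range(4):
            t = t0 + k * math.pi / 2
            x = tuple(math.cos(t) * p + math.sin(t) * q for p, q in zip(u, w))
            y = tuple(-math.sin(t) * p + math.cos(t) * q for p, q in zip(u, w))
            triples.add(tuple(''.join(sgn(dot(d, ej)) for ej in e) for d in (x, y, z)))
    return triples


# Hypothetical simplex; the seven rays f_a of (4) are the rows of (6)
v = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.3, 0.9, 0.0), (0.2, 0.3, 0.8)]
e = {f"{i + 1}{j + 1}": sub(v[i], v[j]) for i, j in combinations(range(4), 2)}
rays = [cross(e[a], e[b]) for a, b in (('12', '23'), ('12', '24'), ('13', '34'), ('23', '34'),
                                       ('12', '34'), ('13', '24'), ('14', '23'))]
minimal = set()
for f in rays:
    for s in (1.0, -1.0):  # each row of (6) and its centrally symmetric ray
        minimal |= boundary_triples(v, tuple(s * c for c in f))
print(len(minimal))
```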

VI. The general embedding problem 

In molecular dynamics simulations, atoms are represented by pointlike structures surrounded by a force field; thus any four atoms in a molecular structure can form a simplex. If an order relation has been defined between the atoms of the system, then the vectors (2) and (5) can also be defined for every simplex, with the node numbers of Fig. 1 representing the order of the atoms.

Some of the vectors (2) and (5) are shared between simplexes through common edges and faces; as a consequence, orienting a simplex restricts the range of available orientations in the other simplexes. Embedding a 3D conformation in CS can be done with this simple algorithm:

1) take a set of connected simplexes³ such that every pair of atoms in the structure is in at least one simplex,

2) choose a simplex with a non-empty set of available sign vectors, otherwise terminate the procedure,

3) select one orientation and restrict the available orientations in the other simplexes to the ones compatible with this choice. Repeat step 2.

³ Two simplexes $S_a$ and $S_b$ are connected if there exists a sequence of simplexes $S_a = S_1, S_2, \ldots, S_n = S_b$ such that $S_i$ and $S_{i+1}$ are adjacent.
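The three steps can be sketched as a greedy constraint-propagation loop. The sketch rests on a strong simplifying assumption that is not in the text: an orientation of a simplex is reduced to the dominance signs of a single coordinate (a strict ordering of its four atoms), so compatibility just means agreeing on shared atom pairs. The molecule, the simplex list and the function names are hypothetical, and the loop does not backtrack.

```python
from itertools import combinations, permutations


def orientations(simplex):
    """All strict orderings of the four atoms along one axis, as {(i, j): sign} maps.

    Simplifying assumption: an orientation is reduced to the dominance signs of a
    single coordinate, ignoring ties and the other two coordinates.
    """
    options = []
    for perm in permutations(simplex):
        rank = {atom: r for r, atom in enumerate(perm)}
        options.append({(i, j): '-' if rank[i] < rank[j] else '+'
                        for i, j in combinations(sorted(simplex), 2)})
    return options


def compatible(option, fixed):
    # every pair already decided must keep its sign
    return all(fixed.get(pair, sign) == sign for pair, sign in option.items())


def embed(simplexes):
    available = {s: orientations(s) for s in simplexes}
    fixed, chosen = {}, {}
    while True:
        # step 2: a simplex still to be oriented with a non-empty set of options
        todo = [s for s in simplexes if s not in chosen and available[s]]
        if not todo:
            return chosen, fixed
        s = todo[0]
        chosen[s] = available[s][0]   # step 3: select one orientation...
        fixed.update(chosen[s])
        for t in simplexes:           # ...and restrict the others
            if t not in chosen:
                available[t] = [o for o in available[t] if compatible(o, fixed)]


# Hypothetical molecule of five atoms covered by three connected simplexes (step 1)
simplexes = [(0, 1, 2, 3), (0, 1, 2, 4), (0, 1, 3, 4)]
chosen, fixed = embed(simplexes)
print(len(chosen), len(fixed))  # 3 10
```

Every one of the ten atom pairs ends up with a single sign shared by all simplexes containing it; a full treatment would carry the three coordinates together and would need backtracking when a choice empties another simplex's options.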

VII. Conclusion 

The two main results of this communication are 

1) simplexes can be put into a number of discrete classes: not taking handedness into account, there are 258 and 816 sets for (7a) and (7b) respectively and, with both combined, a total of 3936 classes.

2) these classes are related to cells in CS, thus relat- 
ing the binary sets (7) in a molecular conformation to 
3D coordinates. 

Embedding just one 3D conformation is not an interesting issue; what really matters is embedding the volume occupied by a molecular system.

Beyond the embedding problem, the results above offer the possibility of building a structure that encodes the dynamical states of a molecule. This can be seen by analyzing the dynamical activity of all simplexes in a typical molecular dynamics simulation, like the one studied in [5]; we find that

1) 90% of the simplexes evolve within less than 20 
classes, 

2) 0.4% remain in a single class for the duration of 
the simulation, form a connected set and comprise 95% 
of the dominance relations (1), 

3) the most dynamically active simplex spans a range 
of 171 classes, slightly less than 2% of the total. 
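Statistics like the three above only require, for each simplex, the sequence of classes it visits during the simulation. A minimal sketch, assuming the classes (7) have already been assigned frame by frame; the labels below are made up and are not data from [5], and the function name is hypothetical.

```python
def class_statistics(trajectories, threshold=20):
    """Summaries of the kind listed above, from per-frame class labels.

    trajectories maps a simplex (tuple of four atom indices) to the sequence of
    classes it visits along a simulation.
    """
    spans = {s: len(set(labels)) for s, labels in trajectories.items()}
    n = len(spans)
    most_active = max(spans, key=spans.get)
    return {
        'fraction_below_threshold': sum(c < threshold for c in spans.values()) / n,
        'fraction_single_class': sum(c == 1 for c in spans.values()) / n,
        'most_active': most_active,
        'max_span': spans[most_active],
    }


# Made-up labels for three simplexes over four frames
toy = {(0, 1, 2, 3): ['A', 'A', 'A', 'A'],
       (0, 1, 2, 4): ['A', 'B', 'A', 'B'],
       (0, 1, 3, 4): ['C', 'D', 'E', 'C']}
print(class_statistics(toy))
```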

A connected set of simplexes can obviously be represented by a graph, where each node can be split into a number of classes (7): its dynamical states. The connectivity between the sets (7) is an issue that has not been explored in this paper, but its determination will allow the connection between dynamical states in adjacent nodes, thus generating the derived graph of the dynamical states of the molecular system.

This graph is a subject for further research. 